perm filename 106A40[1,RWF] blob
sn#732910 filedate 1984-03-30 generic text, type C, neo UTF8
FILES
by
Robert W. Floyd
Copyright 1983
A file is a sequence of characters or data. Because it is stored in
magnetic disc memory, it is permanent and there is room for it to be very
long, but (unlike an array) it can only be created or used in a fixed
left-to-right order. A file may be read or written by a program. At any
given moment, a file can be in one of three conditions (called modes):
closed, open for reading, open for writing. There is a family of
operations, called input operations, which can only be done on a file that
is open for reading. The input operations bring information from the file
into the program. There is another family of operations, called output
operations, which can only be done on a file that is open for writing.
Opening a file is a bit like checking a book out of the library; nobody
else can use the file while you have it open, and it must eventually be
closed again.
A file in Pascal has two names: an internal name, by which it is called
inside the Pascal program, and an external name, by which it is stored in
the computer's directory and used in editing and other executive
operations.
Usually, a Pascal program allows the user to say what external name
corresponds to each internal name; this allows a single Pascal program to
be applied to many different files. The relation between internal and
external names is closely analogous to the relation between parameters and
arguments.
In the definitions and examples which follow, we shall use ININT as
exemplifying the internal name of a file which has been opened for
reading, OUTINT as the internal name of a file which has been opened for
writing, and INTNAME as a typical internal name of a file in any mode. We
shall use INEXT, OUTEXT, and EXTNAME in the same way as the corresponding
external names. We shall use I, R, C, and B as typical variables of types
INTEGER, REAL, CHAR, and BOOLEAN.
If a certain file is a sequence of elements e↓1 e↓2 e↓3---e↓n, and is open
for reading, there is always a mark, akin to a bookmark, separating the
part of the file that has already been read from the part that has not.
In our examples, we use the @ symbol to show this mark (the actual mark is
not visible) and to show that the file is open for reading. When the file
is first opened for reading, it looks like @e↓1 e↓2 e↓3---e↓n. Eventually
all the elements of the file have been read; the file looks like e↓1 e↓2
e↓3---e↓n@, and this is called the end-of-file condition for that file.
We call the position of the @ symbol the read pointer. If a file is open
for writing, we show this by including a similar write pointer, #. All
writing is done at the right end of the file, so a file open for writing
initially looks like just #, and later looks like e↓1 e↓2 e↓3---e↓n#.
There are files of integers, reals, Boolean values (true or false, bits),
and characters. There are also files of characters separated into lines,
called text files. Because most Pascal file operations are done on text
files, we treat them in detail first.
Text Files
A text file is a sequence of (zero or more) lines, where a line is a
sequence of (zero or more) characters followed by a carriage return symbol.
Assume that OUTINT is the internal name of a text file that has been
opened for writing. The command
WRITE(OUTINT,X)
can be used, when X is an expression of type REAL, INTEGER, CHAR, or
BOOLEAN, to append the value of X, expressed by a string of characters, to
the end of the file OUTINT. If the file initially is ABC#, after
WRITE(OUTINT,13) it becomes ABC__________13#. After WRITE(OUTINT,13.0) it
becomes ABC_1.3000000E+01#. After WRITE(OUTINT,'D') it becomes ABCD#.
After WRITE(OUTINT,3>5) it becomes ABCFALSE#. (For details of the format
of numbers written on a text file, see __________). The command
WRITELN(OUTINT) is used to end a line on a text file open for writing; it
changes ABC# to ABC↓#. Several WRITEs of variables to the same file can
be combined; WRITE (OUTINT,X1,X2,X3) means the same as
WRITE(OUTINT,X1);
WRITE(OUTINT,X2);
WRITE(OUTINT,X3)
and WRITELN(OUTINT,X1,X2) means the same as
WRITE(OUTINT,X1);
WRITE(OUTINT,X2);
WRITELN(OUTINT).
If the text file to which writing is being done has the internal name
OUTPUT, the WRITE and WRITELN commands above can be abbreviated to just
WRITE(X1,X2,X3) and WRITELN(X1,X2); the command WRITELN (OUTPUT) can be
abbreviated to WRITELN. We say that OUTPUT is the /default/ file for
writing. Most Pascal programmers use OUTPUT as the internal name for the
main output text file of a program, in order to use abbreviated commands.
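As a small illustration of the abbreviated forms, the sketch below writes
three lines on OUTPUT. The program name WRITEDEMO and the values written are
our own choices for the example, not part of the definitions above.

```pascal
PROGRAM WRITEDEMO(OUTPUT);
(* Illustrative sketch only: WRITE and WRITELN on the default file OUTPUT. *)
VAR
  I: INTEGER;
  C: CHAR;
BEGIN
  I := 13;
  C := 'D';
  WRITE('ABC');        (* the current line now holds ABC *)
  WRITE(C);            (* ... and now ABCD *)
  WRITELN;             (* end the first line *)
  WRITE(I, 3 > 5);     (* same as WRITE(I); WRITE(3 > 5) *)
  WRITELN;             (* end the second line *)
  WRITELN(C, C)        (* same as WRITE(C); WRITE(C); WRITELN *)
END.
```

The first line written is ABCD and the third is DD. The exact spacing of the
second line depends on the field widths your translator uses for integers
and Boolean values.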
Assume that ININT is the internal name of a text file that has been opened
for reading. The command
READ(ININT,X)
can be used, when X is a variable of type REAL, INTEGER, CHAR, or BOOLEAN,
to take from the file a value of the proper type (expressed in characters)
and give it to X. If the file initially is ABC@_13_42↓, after
READ(ININT,I) the file would be ABC_13@_42↓, and I would have the value
13, as if I:=13 had been executed. (In reading from a text file to a real
or integer variable, initial spaces or carriage returns are passed over,
and the reading stops when the next character of the file could not be
part of the number being read.) If the file initially contains ABC@DE↓,
after READ(ININT,C) the file is ABCD@E↓ and C contains 'D', as if C:='D'
had been executed. If the file initially contains ABC@TRUE_DE↓, after
READ(ININT,B) the file is ABCTRUE@_DE↓, and B is true. It is an error to try
to read a file unless a value of the required type is next on the file.
The command READLN(ININT) moves the read mark past the next carriage return.
It changes ABC@DE↓FG↓ to ABCDE↓@FG↓.
Abbreviated forms for reading several variables are
READ(ININT,X1,X2,X3), meaning
READ(ININT,X1);
READ(ININT,X2);
READ(ININT,X3) and
READLN(ININT,X1,X2), meaning
READ(ININT,X1);
READ(ININT,X2);
READLN(ININT).
To test whether the next character of ININT is a carriage return, use the
test EOLN(ININT), which is, for example, true if the file is ABC@↓DE↓,
false if it is ABC↓@DE↓; EOLN stands for end-of-line. This test may be
needed while reading characters, to distinguish spaces from carriage
returns, because the READ command treats the carriage return as a space.
None of the above input operations is legal if there is an end-of-file
condition. To test for an end-of-file condition, use EOF(ININT), which is
true for ABC↓@ but false for ABC@↓. If there is the possibility of an
end-of-file condition, a program should check by EOF before trying any
other input operation.
If the text file from which reading is being done has the internal name
INPUT, the READ, READLN, EOLN, and EOF operations above can be abbreviated
to READ(X1,X2,X3), READLN(X1,X2) or READLN, EOLN, and EOF. Most
programmers use INPUT, the default internal name, for the major input text
file of a program, in order to use abbreviated commands.
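The need for EOLN can be seen in a sketch like the following, which counts
the spaces on the first line of INPUT. The program name COUNTSPACES and the
variable SPACES are hypothetical. Without the EOLN test, the carriage return
would be read as one more space and counted.

```pascal
PROGRAM COUNTSPACES(INPUT, OUTPUT);
(* Illustrative sketch: count the spaces on the first line of INPUT. *)
VAR
  C: CHAR;
  SPACES: INTEGER;
BEGIN
  SPACES := 0;
  IF NOT EOF THEN                (* no input operation is legal at end-of-file *)
    BEGIN
      WHILE NOT EOLN DO          (* stop before the carriage return *)
        BEGIN
          READ(C);
          IF C = ' ' THEN
            SPACES := SPACES + 1
        END;
      READLN                     (* move past the carriage return *)
    END;
  WRITELN(SPACES)
END.
```

If the first line of INPUT is A B C, the program writes the value 2.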
Non-Text Files
If a file is not of type TEXT, it can hold values only of one single type.
Say INTNAME is a file of integers. Then integer values I1 and I2 can be
written on it by WRITE(INTNAME,I1,I2) when it is open for writing. When
it is open for reading, values can be read from it to integer variables I1
and I2 by READ(INTNAME, I1,I2). An end-of-file condition can be tested by
EOF(INTNAME). No other input or output operations can be done. Similar
restrictions apply to files of real, boolean, or character values. Files
of integer, real, and boolean values are not in a form suitable for
printing or for reading on the terminal screen; they are used primarily
for communication between programs, or between successive stages of a
single program.
Back to Files in General
A file may be opened for reading by the command RESET(INTNAME). It may be
opened for writing by the command REWRITE(INTNAME). It may be closed by
CLOSE(INTNAME).(?) At the end of a Pascal program, all its files are
automatically closed. At the beginning of a Pascal program, the files
INPUT and OUTPUT are automatically opened for reading and writing
respectively. Other files must be opened explicitly before any input or
output operations can be executed.
(?) If a Pascal program is stopped by intervention from the terminal, the
files may not be automatically closed. In this case, the TOPS-20 command
CLOSE may be used to close the files.
All files in Pascal except INPUT and OUTPUT must be declared as variables,
of type either TEXT or FILE OF t, where t is any type not itself
containing files. (For example, FILE OF REAL and FILE OF ARRAY [1..10] OF
CHAR). The names INPUT and OUTPUT are implicitly declared of type TEXT,
and should not be declared as variables.
Some files are used entirely within a single program; they are written by
that program, later are read during the same execution, and are then no
longer needed. Such /internal/ files do not need an external name. Input
files which provide data to the program from the user or some other
external source, and output files for printing or other subsequent use,
must have an external name. Such /external/ files must be listed in the
program header line. The header is of the form
PROGRAM P(INTNAME1, INTNAME2,INTNAME3);
where all internal names of external files are listed (including INPUT
and OUTPUT if they are used). When the program is executed, the user will
be asked for the external name corresponding to each internal name.
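An internal file can be sketched as follows. The names SQUARES, SCRATCH, SQ,
and SUM are our own, and the program is written in standard Pascal; some
translators may require an internal file to be associated with a file name
before REWRITE, so treat this as an outline rather than a guaranteed recipe.

```pascal
PROGRAM SQUARES(OUTPUT);
(* Illustrative sketch: SCRATCH is an internal file, so it is declared
   as a variable but is not listed in the program header. *)
VAR
  SCRATCH: FILE OF INTEGER;
  I, SQ, SUM: INTEGER;
BEGIN
  REWRITE(SCRATCH);              (* open for writing; the file is empty *)
  FOR I := 1 TO 5 DO
    BEGIN
      SQ := I * I;
      WRITE(SCRATCH, SQ)         (* append 1, 4, 9, 16, 25 in turn *)
    END;
  RESET(SCRATCH);                (* open for reading, from the beginning *)
  SUM := 0;
  WHILE NOT EOF(SCRATCH) DO
    BEGIN
      READ(SCRATCH, SQ);
      SUM := SUM + SQ
    END;
  WRITELN(SUM)                   (* 1 + 4 + 9 + 16 + 25 = 55 *)
END.
```

Only OUTPUT appears in the header, because it is the only external file the
program uses.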
It is possible for the program to "see" the next single element of
a file without actually moving the read pointer. If the file's internal
name is ININT, and the file contains e↓1 e↓2---e↓{i-1}@e↓i---e↓n, the
expression ININT↑ (or ININT∧ on some keyboards) has the value e↓i. To
discard characters from a text file up to but not including the next space,
for example, one could do
WHILE ININT↑<>' ' DO READ(ININT,C).
The read pointer can be moved one place to the right without reading, by
the command GET(ININT). In fact, READ(ININT,C) is an abbreviation for
C:=ININT↑; GET(ININT). The above example, then, could also be
WHILE ININT↑<>' ' DO GET(ININT).
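A complete program built around this loop might look like the sketch below,
which discards the first word of the first line of INPUT and copies the rest
of that line to OUTPUT. The name SKIPWORD is hypothetical, the file pointer
is written INPUT^ as in standard Pascal, and we have added EOF and EOLN
tests that the one-line loop above omits. Some translators accept INPUT^
only in a standards mode.

```pascal
PROGRAM SKIPWORD(INPUT, OUTPUT);
(* Illustrative sketch: skip the first word of the first line of INPUT,
   then copy the remainder of that line to OUTPUT. *)
VAR
  C: CHAR;
BEGIN
  IF NOT EOF THEN
    BEGIN
      (* INPUT^ is the next character; GET(INPUT) steps past it
         without storing it anywhere. *)
      WHILE (NOT EOLN) AND (INPUT^ <> ' ') DO
        GET(INPUT);
      (* The read pointer now stands just before a space or the
         carriage return. *)
      WHILE NOT EOLN DO
        BEGIN
          READ(C);
          WRITE(C)
        END;
      WRITELN
    END
END.
```

Given the line HELLO WORLD, the program writes a space followed by WORLD,
since the space itself is not discarded.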
Output files for printing should be limited to the width of the printer,
132 characters; that is, your program should execute WRITELN before
writing more than 132 characters. At most 60 lines will fit on a page;
the command PAGE(OUTINT) can be used to start at the beginning of a new
page, even if the old one is not full. (If PAGE is not used, a new page
will be used whenever a page fills up.) Similarly, output to be viewed on
the terminal should be at most 80 characters wide and (if you want to see
it all at once) 24 lines high.
File Cliches
To read and process every item of a non-text file, or every character of a
text file, with initialization before the first and finalization after the
last, the standard pattern in Pascal is:
BEGIN
  RESET(f); (* not needed initially if f is INPUT *)
  initialize;
  WHILE NOT EOF(f) DO
    BEGIN
      READ(f,x);
      process datum in x
    END;
  (* all data processed, at end of file *)
  finalize
END
If the file is a text file and the data are not single characters, it is
much harder to write a correct program to process a file of numbers
terminated by the end-of-file. After each number, the program must check
each character position in turn for a null (space or carriage return)
or end of file. It must not try to test a character in the buffer, however,
when the end-of-file has been reached. The following program, in every
position of the file, first tests for end-of-file, leaving the iteration
if present; then tests for a null, discarding it if present. If neither is
true, it is correct to read and process a datum.
BEGIN
  initialize;
  (*RESET(f) if needed*)
  WHILE NOT EOF(f) DO
    BEGIN
      IF (f↑ is null) THEN GET(f)
      ELSE
        BEGIN
          READ(f,x);
          process x
        END
    END;
  (*RESET(f) if needed*)
  finalize
END
where GET(f) moves forward one character position in file f, changing
AB@CD to ABC@D. If EOF(f) is true, GET(f) is an error.
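Filled in for a concrete task, the pattern might read as in the sketch
below, which adds up all the integers on INPUT. The names SUMALL, X, and SUM
are our own, and we assume the file holds nothing but integers, spaces, and
carriage returns. The null test is written with both EOLN and a comparison
against a space, in case a translator does not present the carriage return
as a space in INPUT^.

```pascal
PROGRAM SUMALL(INPUT, OUTPUT);
(* Illustrative sketch: add up all the integers on INPUT, which may be
   separated by any number of spaces and carriage returns. *)
VAR
  X, SUM: INTEGER;
BEGIN
  SUM := 0;                          (* initialize *)
  WHILE NOT EOF DO
    IF EOLN OR (INPUT^ = ' ') THEN
      GET(INPUT)                     (* discard a null *)
    ELSE
      BEGIN
        READ(X);                     (* a number must start here *)
        SUM := SUM + X               (* process x *)
      END;
  WRITELN(SUM)                       (* finalize *)
END.
```

For an input of two lines, the first holding 3 4 and the second holding 5,
the program writes the value 12.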
Another method uses a sentinel in the file to mark the end of
the data. A sentinel is a special value, like 999999, of the same type as
the data, and so can be read by the same READ that reads the data.
BEGIN
  (*RESET(f) if needed*)
  initialize;
  READ(f,x);
  WHILE (x is not a sentinel) DO
    BEGIN
      process x;
      READ(f,x)
    END;
  (*RESET(f) if needed*)
  finalize
END
The above program is more efficient, needing only one test per iteration,
but its structure is rather peculiar; each iteration processes the number
read on the previous iteration.
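As a concrete instance, the sketch below sums integers from INPUT until it
meets a sentinel. The names SUMTOSENTINEL and SENTINEL are hypothetical, and
we have chosen 9999 rather than 999999 only so that the constant fits on
translators whose largest integer is small; we assume 9999 never occurs as a
genuine datum.

```pascal
PROGRAM SUMTOSENTINEL(INPUT, OUTPUT);
(* Illustrative sketch: sum the integers on INPUT up to a sentinel. *)
CONST
  SENTINEL = 9999;       (* assumed never to occur as a real datum *)
VAR
  X, SUM: INTEGER;
BEGIN
  SUM := 0;              (* initialize *)
  READ(X);               (* first datum, or the sentinel itself *)
  WHILE X <> SENTINEL DO
    BEGIN
      SUM := SUM + X;    (* process the datum read last time round *)
      READ(X)            (* fetch the next datum or the sentinel *)
    END;
  WRITELN(SUM)           (* finalize *)
END.
```

For the input 3 4 5 9999 the program writes the value 12. If the sentinel is
missing, the final READ runs into the end of the file, which is an error.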
To process one line of characters on a text file, where every line must
end with ↓, and you know a line of data is present, so no EOF test is
needed:
BEGIN
  initialize;
  WHILE NOT EOLN(f) DO
    BEGIN
      READ(f,c);
      process c
    END;
  (* next character is end-of-line *)
  READ(f,c); (* or GET(f) or READLN(f) *)
  finalize
END
To process every line of characters in a text file, with process A
initializing the whole computation, process B initializing the processing
of one line, process D finalizing the processing of one line, and process
E finalizing the whole computation:
BEGIN
  A;
  (* reset(f) if needed *)
  WHILE NOT EOF(f) DO
    (* process one line *)
    BEGIN
      B;
      WHILE NOT EOLN(f) DO
        BEGIN
          READ(f,c);
          process c
        END;
      READ(f,c); (* discard end-of-line symbol *)
      D
    END;
  (* end of file *)
  E
END
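A minimal instance of this pattern is sketched below: it writes the length
of each line of INPUT, and then the number of lines. The names LINELENGTHS,
LINES, and COUNT are our own. We use READLN to discard the end-of-line
symbol, which the single-line pattern earlier lists as an alternative to
READ(f,c).

```pascal
PROGRAM LINELENGTHS(INPUT, OUTPUT);
(* Illustrative sketch: write the length of every line of INPUT,
   followed by the number of lines. *)
VAR
  C: CHAR;
  LINES, COUNT: INTEGER;
BEGIN
  LINES := 0;                      (* A: initialize the whole computation *)
  WHILE NOT EOF DO
    BEGIN
      COUNT := 0;                  (* B: initialize one line *)
      WHILE NOT EOLN DO
        BEGIN
          READ(C);
          COUNT := COUNT + 1       (* process c *)
        END;
      READLN;                      (* discard end-of-line symbol *)
      WRITELN(COUNT);              (* D: finalize one line *)
      LINES := LINES + 1
    END;
  WRITELN(LINES)                   (* E: finalize the whole computation *)
END.
```

For a file of three lines holding AB, nothing, and CDE, the program writes
the values 2, 0, and 3, and then 3 for the number of lines.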
Example: a program to test that every line of file f contains at least
one asterisk:
BEGIN
  (*RESET(f) if needed*)
  EVERYLINE:= TRUE;
  WHILE NOT EOF(f) DO
    BEGIN
      THISLINE:= FALSE;
      WHILE NOT EOLN(f) DO
        BEGIN
          READ(f,c);
          IF c='*' THEN
            THISLINE:= TRUE
        END;
      READ(f,c);
      IF NOT THISLINE THEN
        EVERYLINE:= FALSE
    END;
  IF EVERYLINE THEN WRITE('EVERY LINE HAS A STAR')
  ELSE WRITE('NOT EVERY LINE HAS A STAR')
  (*RESET(f) if needed*)
END
Example: Program to read a text file, and left-justify it with as many
words as possible on a line. The original line breaks are ignored.
PROGRAM(---);
Declarations;
PROCEDURE PROCESS (X: CHAR);
  BEGIN
    IF X <> ' ' THEN
      BEGIN
        LETCOUNT:= LETCOUNT + 1; (*WORD LENGTH*)
        WORD [LETCOUNT]:= X
      END
    ELSE
      BEGIN
        IF LINELENGTH + LETCOUNT + 1 > LIMIT THEN
          BEGIN
            WRITELN;
            LINELENGTH:= 0
          END;
        IF LINELENGTH > 0 THEN
          BEGIN
            WRITE (' ');
            LINELENGTH:= LINELENGTH + 1 + LETCOUNT
          END
        ELSE LINELENGTH:= LETCOUNT;
        FOR I:= 1 TO LETCOUNT DO
          WRITE(WORD [I]);
        LETCOUNT:= 0 (*WORD HAS BEEN WRITTEN; START THE NEXT ONE*)
      END
  END;
BEGIN (* MAIN PROGRAM *)
  LETCOUNT:= 0;
  LINELENGTH:= 0;
  LASTC:= ' ';
  WHILE NOT EOF DO
    BEGIN
      READ(C);
      IF (LASTC <> ' ') OR (C <> ' ') THEN
        PROCESS(C);
      LASTC:= C
    END;
  (*RESET if needed*)
END.
If running time is important, the call on PROCESS(C) can be
replaced by a slightly modified copy of the procedure body. The program
uses the fact that READ treats the end-of-line symbol as a space.
Pseudo-files
Pascal treats several entities more or less as if they were files. The
external name TTY: (note the colon) can be used to let the corresponding
internal name represent data typed at the terminal keyboard when in read
mode, and represent the terminal screen when in write mode. In write
mode, this is sometimes useful to see the results of a program immediately
as it writes them, especially if the lines of output are no more than the
80-character width of the screen. In read mode, the use of external name
TTY: cannot be recommended; a program designed for input from a true
file is usually not well designed for keyboard input. See ______________
for details.
The internal name TTY can also be used to represent the terminal keyboard
and screen. Output commands to TTY, such as WRITE(TTY,X1,X2), are quite
analogous to output commands for files. Input commands, however, differ.
During the execution of a Pascal program, the sequence of characters typed
at the keyboard is used as if it were a text file called TTY for all input
operations. A line typed in is made available to the program only when it
is completed with a carriage return; up until entry of the carriage return
the user may modify the line at will, for example by backspacing. When
the program has consumed all available keyboard input, it stops and waits
until the user has typed another complete line. To avoid impasses, a
program should request more input before consuming all available input
characters (in particular, the final carriage return). The pseudo-file
with internal name TTY is initialized to an empty line, as if a single
carriage return had already been typed in, so that the program can
continue execution. (Older versions of the translator required keyboard
input before the program would start.) The pseudo-file TTY is not
mentioned in the program header, nor is it declared. It is treated as of
type TEXT. For input pseudo-files TTY or TTY:, an end-of-file condition
is created only by typing CONTROL/Z; usually programs intended for
terminal input do not use the end-of-file condition.
The external name LPT: (note the colon) can be used as a pseudo-file, in
write mode, for information which is automatically printed upon completion
of execution, and is not retained as a permanent file.
Terminal Input Cliches
Reading data as lines of characters from the terminal is typically done by
the program below.
(*No declaration needed for TTY*)
BEGIN
  INITIALIZE;
  WRITE(TTY, prompting message); (*Explain to the user what he must type*)
  READLN(TTY); (* discard remnants of previous line; program waits here
                  until user completes a new line *)
  WHILE NOT EOLN(TTY) DO
    BEGIN
      READ(TTY,C);
      Process C
    END;
  Process carriage return, if required, without reading it.
END
Alternatively, a line can be read into a /string variable/ S, of type
ARRAY[1..80] OF CHAR; the program below tests each line for validity and
rereads until an acceptable line has been read.
BEGIN
  WRITE(TTY, prompting message);
  BADDATA:= TRUE;
  WHILE BADDATA DO
    BEGIN
      READLN(TTY); (*Discard remnants of earlier line*)
      READ(TTY,S); (*Puts in S the entire line except carriage return*)
      IF (S is acceptable) THEN
        BADDATA:= FALSE
      ELSE
        WRITE(TTY, reprompting message) (*Explain exact requirements
                                          for correct input*)
    END
END
The Naming of Files
Like the naming of cats [1], the naming of files should not be undertaken
lightly. See the section of the LOTS Overview on file descriptors [ ] for
the rules about directory names. The extension field of your program
should normally be PGO, for translation by the PASSGO translator. If your
program uses a separate library of subprograms, use the PASCAL translator
by making PAS the extension field. The file name proper of your program
for a course assignment should begin with an identification of the
assignment number. A program for assignment 8, to do an integration,
might be in file P8INT.PGO. The data file for a program should have the
same name except for extension field DAT (e.g., P8INT.DAT); the output
file should have the same name except for extension field OUT (e.g.,
P8INT.OUT). An interactive program should keep a permanent record of all
input, perhaps on a file with extension field LOG; this allows later
confirmation that data were entered correctly.
It is a common disastrous error to give the name of the program as the
external name of the output file; this results in deletion of the program
file. When this happens, (1) DO NOT LOGOUT, (2) DO NOT EXPUNGE, until
the deleted file has been recovered. See Floyd's notes, Appendix C, for
methods to locate and restore deleted files. As a safety measure, you can
set the normal number of file generations retained in your directory to 2.
(E.g., if your program is in file P8INT.PGO.10, and you send output to
P8INT.PGO, it will go to P8INT.PGO.11, and generations 10 and 11 will
both be retained.) If you do so, you should delete all obsolete files
at the end of each terminal session.